Interrogating sex as a biological variable in preterm birth inequity
=======
Regional Analysis of Preterm Birth Characteristics
>>>>>>> f88472e3a542e8cb88557f16a8d613ce8a791472
BMIN503/EPID600 Final Project
Author
Rose Albert
<<<<<<< HEAD
1 Interrogating sex as a biological variable in preterm birth inequity
=======
1 Interrogating sex as a biological variable in county-level preterm birth inequity
>>>>>>> f88472e3a542e8cb88557f16a8d613ce8a791472
December 13, 2024
By Rose Albert
1.1 Overview
<<<<<<< HEAD
Preterm birth affects approximately 10% of all births, and there is a well-established male vulnerability and female resilience during the neonatal period. The goal of this project is to interrogate sex as a biological variable using county-level data from the National Vital Statistics System queried using CDC WONDER (Wide-Ranging Online Data for Epidemiologic Research). This project will assess preterm-birth rates and NICU admissions. I have spoken with two experts in perinatal health, Dr. Aimin Chen and Dr. Heather Burris, about biological and structural determinants contributing to preterm birth inequity, as well as potential areas of clinical, public health, and policy intervention.
=======
Preterm birth affects approximately 10% of all births, and there is a well-established male vulnerability and female resilience during the neonatal period. The goal of this project is to interrogate sex as a biological variable at the county-level using CDC Wonder Natality Data and identify sex-differences in preterm birth rates and NICU admission. I have spoken with Dr. Aimin Chen and Dr. Heather Burris about biological and structural determinants contributing to preterm birth inequity, as well as potential areas of clinical, public health, and policy intervention.
More than a third of infant deaths in the US are preterm-related (Callaghan et al., 2006). Preterm birth can predispose infants to immediate risks in the neonatal period as well as long-term adverse sequelae. For example, preterm infants are born with underdeveloped lungs and often require therapeutic interventions such as surfactant administration, antibiotics, steroids, mechanical ventilation, and supplemental oxygen for survival (Thebaud et al., 2019). However, these therapeutics can also pose postnatal insults and contribute to lung injury and chronic diseases such as bronchopulmonary dysplasia (Thebaud et al., 2019). Despite advances in neonatal care that have improved overall survival of preterm infants, rates of bronchopulmonary dysplasia have continued to rise (Bell et al., 2022). Bronchopulmonary dysplasia is one of many male-biased complications of prematurity (van Westering-Kroon et al. 2021). In addition to a male disadvantage in neonatal outcomes, there is also a male vulnerability for pregnancy complications and indications for preterm birth (Inkster et al., 2021).
Interrogation of sex as a biological variable has received increasing attention as a critical area for understanding basic pathophysiology and differential response to therapeutics (Albert, Lee, & Lingappan, 2023). The NIH has also established consideration of sex as a biological variable as a research mandate (“NOT-OD-15-102”, 2015). This project will investigate male susceptibility to preterm birth and NICU admission using publicly available county-level data. An understanding of preterm birth characteristics can inform neonatal care toward precision medicine approaches.
1.3 Methods
Data sources
County-level Birth Characteristics Data Source: I am using the CDC Wonder Natality Data (2023) grouped by “County of Residence,” “Sex of Infant,” “NICU Admission,” and “OE Gestational Age Recode 10,” with the measures “Births” and “Percent of Total Births.”
Mapping Data Source: I am drawing from Assignment 5 to use geospatial data, tidy census, and leaflet to generate interactive maps.update.packages(“tidycensus”)
Data curation and cleaning
To work with this dataset, I loaded the necessary libraries and merged the county population data with the birth data by “County.of.Residence.Code”. To clean the data, I am removing redundant columns and renaming my variables. I am also creating a new variable for preterm birth by grouping the gestational ages that are <37 weeks. I then merge this with my county shape files.
=======
Preterm birth can predispose infants to immediate risks in the neonatal period as well as long-term adverse sequelae. For example, preterm infants are born with underdeveloped lungs and often require therapeutic interventions such as surfactant administration, antibiotics, steroids, mechanical ventilation, and supplemental oxygen for survival. However, these therapeutics can also pose postnatal insults and contribute to lung injury and diseases such as bronchopulmonary dysplasia.
Interrogation of sex as a biological variable has received increasing attention as a critical area for understanding basic pathophysiology and differential response to therapeutics. Male sex is an independent risk factor for numerous pediatric diseases and diseases of prematurity. This project will investigate male susceptibility to preterm birth and NICU admission at the county-level. An understanding of preterm birth characteristics can inform local public health interventions.
1.3 Methods
Data curation and cleaning
County-level Birth Characteristics Data Source: I am using the CDC Wonder Natality Data (2023) grouped by “County of Residence,” “Sex of Infant,” “NICU Admission,” and “OE Gestational Age Recode 10,” with the measures “Births” and “Percent of Total Births.”
Mapping Data Source: I am drawing from Assignment 5 to use geospatial data, tidy census, and leaflet to generate interactive maps.update.packages(“tidycensus”)
>>>>>>> f88472e3a542e8cb88557f16a8d613ce8a791472
#load necessary librarieslibrary(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(sf)
Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
library(tigris)
To enable caching of data, set `options(tigris_use_cache = TRUE)`
in your R script or .Rprofile.
library(tidycensus)library(leaflet)options(tigris_use_cache =TRUE)options(progress_enabled =FALSE)#Load county-level birth data downloaded from CDC Wonder
<<<<<<< HEAD
birth_data <-read.delim("https://raw.githubusercontent.com/rosemalbert/BMIN503_Final_Project/refs/heads/master/Natality%2C%202016-2023%20expanded_sex.txt")county_data <-read.delim("https://raw.githubusercontent.com/rosemalbert/BMIN503_Final_Project/refs/heads/master/Natality%2C%202016-2023%20expanded%20_county.txt")
=======
birth_data <-read.delim("https://raw.githubusercontent.com/rosemalbert/BMIN503_Final_Project/refs/heads/master/Natality%2C%202016-2023%20expanded_sex.txt")county_data <-read.delim("https://raw.githubusercontent.com/rosemalbert/BMIN503_Final_Project/refs/heads/master/Natality%2C%202016-2023%20expanded%20_county.txt")
>>>>>>> f88472e3a542e8cb88557f16a8d613ce8a791472
#view column namescolnames(birth_data)
#merge birth_data and county_datamerged_data <-left_join(birth_data, county_data, by ="County.of.Residence.Code")
Warning in left_join(birth_data, county_data, by = "County.of.Residence.Code"): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 11803 of `x` matches multiple rows in `y`.
ℹ Row 250 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
"many-to-many"` to silence this warning.
#merge birth_data and county_datamerged_data <-left_join(birth_data, county_data, by ="County.of.Residence.Code")
Warning in left_join(birth_data, county_data, by = "County.of.Residence.Code"): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 11803 of `x` matches multiple rows in `y`.
ℹ Row 250 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
"many-to-many"` to silence this warning.
>>>>>>> f88472e3a542e8cb88557f16a8d613ce8a791472
#clean merged_data library(dplyr)# clean merged data by removing unnecessary columns, checking discrepancies, and renamingcleaned_data <- merged_data %>%
<<<<<<< HEAD
# Remove 'Notes.x' columnselect(-Notes.x) %>%# Check for discrepancies between 'County.of.Residence.x' and 'County.of.Residence.y'
=======
# Remove 'Notes.x' columnselect(-Notes.x) %>%# Check for discrepancies between 'County.of.Residence.x' and 'County.of.Residence.y'
>>>>>>> f88472e3a542e8cb88557f16a8d613ce8a791472
# mutate(County.of.Residence =ifelse( County.of.Residence.x == County.of.Residence.y, County.of.Residence.x, NA ) ) %>%
<<<<<<< HEAD
# Merge 'County.of.Residence.x' and 'County.of.Residence.y'select(-County.of.Residence.x, -County.of.Residence.y) %>%# Rename 'Births.y' to 'Total.Births'
=======
# Merge 'County.of.Residence.x' and 'County.of.Residence.y'select(-County.of.Residence.x, -County.of.Residence.y) %>%# Rename 'Births.y' to 'Total.Births'
>>>>>>> f88472e3a542e8cb88557f16a8d613ce8a791472
rename(Total.Births = Births.y ) %>%
<<<<<<< HEAD
# Remove 'Notes.y' column
=======
# Remove 'Notes.y' column
>>>>>>> f88472e3a542e8cb88557f16a8d613ce8a791472
select(-Notes.y) %>%# Remove sex of infant codeselect(-Sex.of.Infant.Code) %>%
<<<<<<< HEAD
# Remove 'X..of.Total.Births.y' and 'X..of.Total.Births.x'select(-c(X..of.Total.Births.x, X..of.Total.Births.y)) %>%# Remove 'OE.Gestational.Age.Recode.10.Code'select(-OE.Gestational.Age.Recode.10.Code) %>%# Remove 'NICU.Admission.Code'select(-NICU.Admission.Code) %>%# Filter out rows where 'NICU.Admission' or 'OE.Gestational.Age.Recode.10' are "Unknown"filter(!NICU.Admission %in%c("Unknown or Not Stated", ""),!OE.Gestational.Age.Recode.10%in%c("Unknown or Not Stated", "")
=======
# Remove 'X..of.Total.Births.y' and 'X..of.Total.Births.x'select(-c(X..of.Total.Births.x, X..of.Total.Births.y)) %>%# Remove 'OE.Gestational.Age.Recode.10.Code'select(-OE.Gestational.Age.Recode.10.Code) %>%# Remove 'NICU.Admission.Code'select(-NICU.Admission.Code) %>%# Filter out rows where 'NICU.Admission' or 'OE.Gestational.Age.Recode.10' are "Unknown"filter(!NICU.Admission %in%c("Unknown or Not Stated", ""),!OE.Gestational.Age.Recode.10%in%c("Unknown or Not Stated", "")
>>>>>>> f88472e3a542e8cb88557f16a8d613ce8a791472
) %>%# Remove any duplicatesdistinct()cleaned_data <- cleaned_data %>%mutate(County.of.Residence.Code =as.character(County.of.Residence.Code))
<<<<<<< HEAD
# Convert 'County.of.Residence.Code' to character before mergingpreterm_rate_data <- cleaned_data %>%mutate(Is.Preterm = OE.Gestational.Age.Recode.10%in%c("28 - 31 weeks", "32 - 35 weeks", "36 weeks", "20 - 27 weeks", "Under 20 weeks"),County.of.Residence.Code =as.character(County.of.Residence.Code) # Convert to character ) %>%group_by(County.of.Residence, County.of.Residence.Code) %>%# Keep 'County.of.Residence.Code' in the group_bysummarise(Preterm.Births =sum(ifelse(Is.Preterm, Births.x, 0), na.rm =TRUE),Preterm.Birth.Rate = (Preterm.Births / Total.Births) *100 ) %>%ungroup()
Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
dplyr 1.1.0.
ℹ Please use `reframe()` instead.
ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
always returns an ungrouped data frame and adjust accordingly.
`summarise()` has grouped output by 'County.of.Residence',
'County.of.Residence.Code'. You can override using the `.groups` argument.
#load census shapefile data and merge by county FIPs counties_sf <-counties(cb =TRUE, resolution ="20m") # cb = TRUE for a simplified geometry
Retrieving data for the year 2022
# Merge the cleaned data with the counties shapefilecounty_map_preterm_data <-left_join(counties_sf, preterm_rate_data, by =c("GEOID"="County.of.Residence.Code"))county_map_cleaned_data <-left_join(counties_sf, cleaned_data, by =c("GEOID"="County.of.Residence.Code"))# Ensure the 'Sex.of.Infant' column is available and that it's a factor or charactercleaned_data$Sex.of.Infant <-as.factor(cleaned_data$Sex.of.Infant)
To create a data frame to assess preterm birth rates, I stratified the preterm-birth data by sex and calculated the preterm birth rate for each county by dividing the number of preterm births by the total number of births in each county, multiplied by 100 to report as a percentage.
# Create a new data frame stratified by sexpreterm_sex_data <- cleaned_data %>%mutate(Is.Preterm = OE.Gestational.Age.Recode.10%in%c("28 - 31 weeks", "32 - 35 weeks", "36 weeks", "20 - 27 weeks", "Under 20 weeks"),County.of.Residence.Code =as.character(County.of.Residence.Code) # Convert to character ) %>%group_by(County.of.Residence, County.of.Residence.Code, Sex.of.Infant) %>%summarise(Preterm.Births =sum(ifelse(Is.Preterm, Births.x, 0), na.rm =TRUE),Preterm.Birth.Rate = (Preterm.Births / Total.Births) *100 ) %>%ungroup()
Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
dplyr 1.1.0.
ℹ Please use `reframe()` instead.
ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
always returns an ungrouped data frame and adjust accordingly.
`summarise()` has grouped output by 'County.of.Residence',
'County.of.Residence.Code', 'Sex.of.Infant'. You can override using the
`.groups` argument.
Analysis plan
Exploratory analysis of data:
Generate county-level choropleth maps of birth rates ((births/total population)*1000 for each county).
Generate county-level choropleth maps of preterm birth rates (<37 gestational age births/total population).
Generate bar graphs showing sex-stratified distribution of births by gestational age groups.
Generate bar graphs showing sex-stratified distribution of NICU admission rates by gestational age groups.
Conduct logistic regression to identify relationships between gestational age and sex with the outcome of NICU admission
Additional data analysis: Identify counties with highest and lowest preterm birth rates, and the highest and lowest NICU admission rates.
1.4 Results
First, I developed county-level choropleth maps of birth rates (live births per 1000 people) for the 496 counties represented in this dataset. I used this output to identify the county with the highest birth rate (Rockland County, NY; 19.4) and the lowest birth rate (Charlotte County, FL; 4.99). The average birth rate was 10.81.
=======
# View cleaned datahead(cleaned_data)
County.of.Residence.Code Sex.of.Infant NICU.Admission
1 24033 Female No
2 24033 Male No
3 24037 Male Yes
4 24999 Female No
5 24999 Male Yes
6 25003 Female No
OE.Gestational.Age.Recode.10 Births.x Total.Births Total.Population
1 28 - 31 weeks 10 10527 947430
2 42 weeks or more 10 10527 947430
3 32 - 35 weeks 10 1309 115281
4 28 - 31 weeks 10 5546 550411
5 40 weeks 10 5546 550411
6 36 weeks 10 890 126818
Birth.Rate County.of.Residence
1 11.11 Prince George's County, MD
2 11.11 Prince George's County, MD
3 11.35 St. Mary's County, MD
4 10.08 Unidentified Counties, MD
5 10.08 Unidentified Counties, MD
6 7.02 Berkshire County, MA
# Convert 'County.of.Residence.Code' to character before mergingpreterm_rate_data <- cleaned_data %>%mutate(Is.Preterm = OE.Gestational.Age.Recode.10%in%c("28 - 31 weeks", "32 - 35 weeks", "36 weeks", "20 - 27 weeks", "Under 20 weeks"),County.of.Residence.Code =as.character(County.of.Residence.Code) # Convert to character ) %>%group_by(County.of.Residence, County.of.Residence.Code) %>%# Keep 'County.of.Residence.Code' in the group_bysummarise(Total.Births =sum(Total.Births, na.rm =TRUE),Preterm.Births =sum(ifelse(Is.Preterm, Births.x, 0), na.rm =TRUE),Preterm.Birth.Rate = (Preterm.Births / Total.Births) *100 ) %>%ungroup()
`summarise()` has grouped output by 'County.of.Residence'. You can override
using the `.groups` argument.
# View the result to ensure 'County.of.Residence.Code' is retainedhead(preterm_rate_data)
# A tibble: 6 × 5
County.of.Residence County.of.Residence.Code Total.Births Preterm.Births
<chr> <chr> <int> <dbl>
1 Ada County, ID 16001 122064 408
2 Adams County, CO 8001 139436 566
3 Adams County, PA 42001 9130 26
4 Aiken County, SC 45003 26698 169
5 Alachua County, FL 12001 48317 266
6 Alamance County, NC 37001 28320 158
# ℹ 1 more variable: Preterm.Birth.Rate <dbl>
#load census shapefile data and merge by county FIPs counties_sf <-counties(cb =TRUE, resolution ="20m") # cb = TRUE for a simplified geometry
Retrieving data for the year 2022
# Merge the cleaned data with the counties shapefilecounty_map_preterm_data <-left_join(counties_sf, preterm_rate_data, by =c("GEOID"="County.of.Residence.Code"))county_map_cleaned_data <-left_join(counties_sf, cleaned_data, by =c("GEOID"="County.of.Residence.Code"))# Integrate with US Census Data
Analysis plan
Exploratory analysis of data: Generate county-level choropleth maps of birth rates ((births/total population)*1000 for each county). Generate county-level choropleth maps of preterm birth rates (<37 gestational age births/total population). Generate sex-stratified maps of preterm birth rates. Generate sex-stratified choropleth maps of NICU admissions by county. Generate bar graphs showing NICU admission rates by sex and sex distribution of gestational age groups.
Data analysis: Conduct 2-way ANOVA to determine if sex as a biological variable is a significant predictor of NICU admission. Identify counties with highest and lowest preterm birth rates, and the highest and lowest NICU admission rates.
1.4 Results
First, I developed a county-level choropleth maps of birth rates (live births per 1000 people) for counties that did not have any missing data. The below map represents 496 counties. Rockland County, NY has the highest birth rate (19.4 per 1000 people) while Charlotte County, FL has the lowest birth rate (4.99 per 1000 people).
The following object is masked from 'package:lubridate':
=======
Attaching package: 'cowplot'
The following object is masked from 'package:lubridate':
>>>>>>> f88472e3a542e8cb88557f16a8d613ce8a791472
stamp
library(nhanesA)library(haven)library(viridis)
Loading required package: viridisLite
# Convert Birth.Rate to numeric (if necessary)county_map_cleaned_data$Birth.Rate <-as.numeric(county_map_cleaned_data$Birth.Rate)# Recheck the structure of the columnstr(county_map_cleaned_data$Birth.Rate)
num [1:11636] NA NA NA NA NA ...
# Remove rows where Birth.Rate is NAcounty_map_cleaned_data <- county_map_cleaned_data %>%filter(!is.na(Birth.Rate))#load map thememap_theme <-function() {theme_minimal() +theme(axis.line =element_blank(), axis.text =element_blank(), axis.title =element_blank(),
<<<<<<< HEAD
panel.grid =element_line(color ="white"), legend.key.size =unit(0.8, "cm"),
=======
panel.grid =element_line(color ="white"), legend.key.size =unit(0.8, "cm"),
>>>>>>> f88472e3a542e8cb88557f16a8d613ce8a791472
legend.text =element_text(size =16), legend.title =element_text(size =16) )}#create my palettemyPalette <-colorRampPalette(viridis(100))# Generate the choropleth map for birth ratesbirth_rates_map <-ggplot(data = county_map_cleaned_data) +geom_sf(aes(fill = Birth.Rate)) +# Map Preterm Birth Rate to the fillmap_theme() +# Apply the custom theme
<<<<<<< HEAD
ggtitle("2023 Birth Rate (live births per 1000 people)") +scale_fill_gradientn(name ="Birth Rate\nrate", colours =myPalette(100))
=======
ggtitle("2023 Birth Rate (live births per 1000 people)") +scale_fill_gradientn(name ="Birth Rate\nrate", colours =myPalette(100))
>>>>>>> f88472e3a542e8cb88557f16a8d613ce8a791472
coord_sf(xlim =c(-79.5, -75), ylim =c(37, 39))
<ggproto object: Class CoordSf, CoordCartesian, Coord, gg>
aspect: function
backtransform_range: function
clip: on
crs: NULL
datum: crs
default: FALSE
default_crs: NULL
determine_crs: function
distance: function
expand: TRUE
fixup_graticule_labels: function
get_default_crs: function
is_free: function
is_linear: function
label_axes: list
label_graticule:
labels: function
limits: list
lims_method: cross
modify_scales: function
ndiscr: 100
params: list
range: function
record_bbox: function
render_axis_h: function
render_axis_v: function
render_bg: function
render_fg: function
setup_data: function
setup_layout: function
setup_panel_guides: function
setup_panel_params: function
setup_params: function
train_panel_guides: function
transform: function
super: <ggproto object: Class CoordSf, CoordCartesian, Coord, gg>
# Print the mapprint(birth_rates_map)
<<<<<<< HEAD
=======
>>>>>>> f88472e3a542e8cb88557f16a8d613ce8a791472
#How many counties are represented in this analysis?num_unique_geoid_counties <-length(unique(county_map_cleaned_data$GEOID))print(num_unique_geoid_counties)
[1] 496
#Which counties have the highest and lowest birth rates?# Find the county with the highest Birth Rate and its value (show only the first entry)highest_birth_rate <- county_map_cleaned_data %>%filter(Birth.Rate ==max(Birth.Rate, na.rm =TRUE)) %>%slice(1) # Select the first row
<<<<<<< HEAD
cat("County with the highest Birth Rate: ", highest_birth_rate$County.of.Residence, "\n")
County with the highest Birth Rate: Rockland County, NY
# Find the county with the lowest Birth Rate and its value (show only the first entry)
<<<<<<< HEAD
lowest_birth_rate <- county_map_cleaned_data %>%filter(Birth.Rate ==min(Birth.Rate, na.rm =TRUE)) %>%slice(1) # Select the first rowcat("County with the lowest Birth Rate: ", lowest_birth_rate$County.of.Residence, "\n")
County with the lowest Birth Rate: Charlotte County, FL
# Calculate the average Birth Rateaverage_birth_rate <- county_map_cleaned_data %>%summarise(Average_Birth_Rate =mean(Birth.Rate, na.rm =TRUE))cat("Average Birth Rate: ", average_birth_rate$Average_Birth_Rate, "\n")
Average Birth Rate: 10.81442
To better visualize and explore the birth rates, I then generated an interactive choropleth map of birth rates by county. This made it easier to identify regional trends and easily navigate to counties of interest such as Philadelphia County. Philadelphia County has a birth rate above average at 11.8 live births per 1000 people. My hometown in Hamilton County, TN has a similar birth rate at 11.9 live births per 1000 people.
#generate interactive choropleth map library(leaflet)library(RColorBrewer)# Define the color scale for the Birth Rate using colorBinpal_fun <-colorBin(palette =brewer.pal(9, "YlOrRd")[c(1:5, 7)], bins =c(0, 5, 10, 15, 20, 25), reverse =FALSE)# Create the popup message for each countypu_message <-paste0(county_map_cleaned_data$County.of.Residence, "<br>Birth Rate: ", round(county_map_cleaned_data$Birth.Rate, 1), " live births per 1000 people")# Create the leaflet map with the polygonsleaflet(county_map_cleaned_data) |>addPolygons(stroke =FALSE, fillColor =~pal_fun(Birth.Rate),fillOpacity =0.7, smoothFactor =0.5,popup = pu_message) |>addProviderTiles(providers$CartoDB.Positron) |>addLegend("bottomright", pal = pal_fun, values =~Birth.Rate, title ='Birth Rate', opacity =1) |>addScaleBar()
Warning: sf layer has inconsistent datum (+proj=longlat +datum=NAD83 +no_defs).
Need '+proj=longlat +datum=WGS84'
I repeated the analysis to generate county-level choropleth maps of preterm birth rates ((<37 gestational age births/total number of live births)*100). The average preterm birth rate of the counties in this dataset is 9.5%. Hinds County, MS has the highest preterm birth rate (17.85%) while Tompkins County, NY has the lowest preterm birth rate (2.63%).
# Convert Preterm.Birth.Rate to numericcounty_map_preterm_data$Preterm.Birth.Rate <-as.numeric(county_map_preterm_data$Preterm.Birth.Rate)# Recheck the structure of the columnstr(county_map_preterm_data$Preterm.Birth.Rate)
num [1:11636] NA NA NA NA NA ...
# Remove rows where Birth.Rate is NAcounty_map_preterm_data <- county_map_preterm_data %>%filter(!is.na(Preterm.Birth.Rate))# Generate the choropleth map for preterm birth ratespreterm_birth_rates_map <-ggplot(data = county_map_preterm_data) +geom_sf(aes(fill = Preterm.Birth.Rate)) +# Map Preterm Birth Rate to the fillmap_theme() +# Apply the custom themeggtitle("2023 Preterm Birth Rate % (<37 gestational age births/total number of live births)") +scale_fill_gradientn(name ="Preterm Birth Rate\nrate", colours =myPalette(100))coord_sf(xlim =c(-79.5, -75), ylim =c(37, 39))
=======
lowest_birth_rate <- county_map_cleaned_data %>%filter(Birth.Rate ==min(Birth.Rate, na.rm =TRUE)) %>%slice(1) # Select the first rowcat("County with the lowest Birth Rate: ", lowest_birth_rate$County.of.Residence, "\n")
County with the lowest Birth Rate: Charlotte County, FL
To better visualize and explore the birth rates, I then generated an interactive choropleth map of birth rates by county.
#generate interactive choropleth map library(leaflet)library(RColorBrewer)# Define the color scale for the Birth Rate using colorBinpal_fun <-colorBin(palette =brewer.pal(9, "YlOrRd")[c(1:5, 7)], bins =c(0, 5, 10, 15, 20, 25), reverse =FALSE)# Create the popup message for each countypu_message <-paste0(county_map_cleaned_data$County.of.Residence, "<br>Birth Rate: ", round(county_map_cleaned_data$Birth.Rate, 1), " live births per 1000 people")# Create the leaflet map with the polygonsleaflet(county_map_cleaned_data) |>addPolygons(stroke =FALSE, fillColor =~pal_fun(Birth.Rate),fillOpacity =0.7, smoothFactor =0.5,popup = pu_message) |>addProviderTiles(providers$CartoDB.Positron) |>addLegend("bottomright", pal = pal_fun, values =~Birth.Rate, title ='Birth Rate', opacity =1) |>addScaleBar()
Warning: sf layer has inconsistent datum (+proj=longlat +datum=NAD83 +no_defs).
Need '+proj=longlat +datum=WGS84'
I repeated the analysis to generate county-level choropleth maps of preterm birth rates (<37 gestational age births/total number of live births).
# Convert Preterm.Birth.Rate to numericcounty_map_preterm_data$Preterm.Birth.Rate <-as.numeric(county_map_preterm_data$Preterm.Birth.Rate)# Recheck the structure of the columnstr(county_map_preterm_data$Preterm.Birth.Rate)
num [1:3222] NA NA NA NA NA ...
# Remove rows where Birth.Rate is NAcounty_map_preterm_data <- county_map_preterm_data %>%filter(!is.na(Preterm.Birth.Rate))# Generate the choropleth map for preterm birth ratespretetrm_birth_rates_map <-ggplot(data = county_map_preterm_data) +geom_sf(aes(fill = Preterm.Birth.Rate)) +# Map Preterm Birth Rate to the fillmap_theme() +# Apply the custom themeggtitle("2023 Preterm Birth Rate % (<37 gestational age births/total number of live births)") +scale_fill_gradientn(name ="Preterm Birth Rate\nrate", colours =myPalette(100))coord_sf(xlim =c(-79.5, -75), ylim =c(37, 39))
>>>>>>> f88472e3a542e8cb88557f16a8d613ce8a791472
<ggproto object: Class CoordSf, CoordCartesian, Coord, gg>
aspect: function
backtransform_range: function
clip: on
crs: NULL
datum: crs
default: FALSE
default_crs: NULL
determine_crs: function
distance: function
expand: TRUE
fixup_graticule_labels: function
get_default_crs: function
is_free: function
is_linear: function
label_axes: list
label_graticule:
labels: function
limits: list
lims_method: cross
modify_scales: function
ndiscr: 100
params: list
range: function
record_bbox: function
render_axis_h: function
render_axis_v: function
render_bg: function
render_fg: function
setup_data: function
setup_layout: function
setup_panel_guides: function
setup_panel_params: function
setup_params: function
train_panel_guides: function
transform: function
super: <ggproto object: Class CoordSf, CoordCartesian, Coord, gg>
<<<<<<< HEAD
preterm_birth_rates_map
#How many counties are represented in this analysis?num_unique_geoid_counties_preterm <-length(unique(county_map_preterm_data$GEOID))print(num_unique_geoid_counties_preterm)
[1] 496
#Which counties have the highest and lowest preterm birth rates?# Find the county with the highest preterm Birth Rate and its value (show only the first entry)highest_preterm_birth_rate <- county_map_preterm_data %>%filter(Preterm.Birth.Rate ==max(Preterm.Birth.Rate, na.rm =TRUE)) %>%slice(1) # Select the first rowcat("County with the highest preterm Birth Rate: ", highest_preterm_birth_rate$County.of.Residence, "\n")
County with the highest preterm Birth Rate: Hinds County, MS
# Find the county with the lowest preterm Birth Rate and its value (show only the first entry)lowest_preterm_birth_rate <- county_map_preterm_data %>%filter(Preterm.Birth.Rate ==min(Preterm.Birth.Rate, na.rm =TRUE)) %>%slice(1) # Select the first rowcat("County with the lowest preterm Birth Rate: ", lowest_preterm_birth_rate$County.of.Residence, "\n")
County with the lowest preterm Birth Rate: Tompkins County, NY
#What is the average preterm birth rate? # Calculate the average preterm birth rateaverage_preterm_birth_rate <-mean(county_map_preterm_data$Preterm.Birth.Rate, na.rm =TRUE)cat("Average preterm birth rate: ", round(average_preterm_birth_rate, 2), "\n")
Average preterm birth rate: 9.5
I now want a similarly interactive map of preterm birth data. From this, I can identify that Philadelphia has an above average preterm birth rate at 11%. My hometown in Hamilton County, TN has a similar preterm birth rate at 10.6%.
#generate interactive choropleth map library(leaflet)library(RColorBrewer)# Define the color scale for the Birth Rate using colorBinpal_fun <-colorBin(palette =brewer.pal(9, "YlOrRd")[c(1:5, 7)], bins =c(0, 5, 10, 15, 20, 25), reverse =FALSE)county_map_preterm_data$Preterm.Birth.Rate <-as.numeric(county_map_preterm_data$Preterm.Birth.Rate)# Create the popup message for each countypu_message <-paste0(county_map_preterm_data$County.of.Residence, "<br>Preterm Birth Rate: ", round(county_map_preterm_data$Preterm.Birth.Rate, 1), "%")# Create the leaflet map with the polygonsleaflet(county_map_preterm_data) |>addPolygons(stroke =FALSE, fillColor =~pal_fun(Preterm.Birth.Rate),fillOpacity =0.7, smoothFactor =0.5,popup = pu_message) |>addProviderTiles(providers$CartoDB.Positron) |>addLegend("bottomright", pal = pal_fun, values =~Preterm.Birth.Rate, title ='Preterm Birth Rate', opacity =1) |>addScaleBar()
Warning: sf layer has inconsistent datum (+proj=longlat +datum=NAD83 +no_defs).
Need '+proj=longlat +datum=WGS84'
I now want to interrogate the total number of births by gestational age group and visualize the distribution using a bar graph. I can see that most of the infants are born at 32-35 weeks and 37-39 weeks gestational age. The <20 week gestational age group is least represented.
# Summarize the data by gestational age for total birthsgestational_births_summary <- cleaned_data %>%group_by(OE.Gestational.Age.Recode.10) %>%summarise(Total_Births =n() # Count the total number of births in each gestational age category ) %>%ungroup()# Plot the total number of births by gestational ageggplot(gestational_births_summary, aes(x = OE.Gestational.Age.Recode.10, y = Total_Births, fill = OE.Gestational.Age.Recode.10)) +geom_bar(stat ="identity", show.legend =FALSE) +labs(title ="Total Number of Births by Gestational Age",x ="Gestational Age at Birth",y ="Total Number of Births" ) +theme(axis.text.x =element_text(angle =45, hjust =1))
Now I want to explore the proportion of NICU admission by gestational age. We can see a general trend where lower gestational ages have a higher proportion of NICU admission. Notably, infants less than 20 weeks old do not follow this trend, though this may represent survivor bias (infants born less than 20 weeks old may not survive for very long after birth or are inadequately powered to be assessed in this analysis). Infants born 20-31 weeks gestational age had the highest proportion of NICU admission.
# Summarizing the data by gestational age and NICU admission statusgestational_nicu_summary <- cleaned_data %>%group_by(OE.Gestational.Age.Recode.10) %>%summarise(Total_Births =n(), # Count the total number of births in each gestational age category (both Yes and No)NICU_Admissions =sum(NICU.Admission =="Yes", na.rm =TRUE), # Count NICU admissionsProportion_NICU_Admission = NICU_Admissions / Total_Births # Calculate proportion of NICU admissions ) %>%ungroup()# Now, plot the proportion of NICU admissions by gestational agelibrary(ggplot2)ggplot(gestational_nicu_summary, aes(x = OE.Gestational.Age.Recode.10, y = Proportion_NICU_Admission, fill = OE.Gestational.Age.Recode.10)) +geom_bar(stat ="identity", show.legend =FALSE) +labs(title ="Proportion of NICU Admissions by Gestational Age",x ="Gestational Age at Birth",y ="Proportion of NICU Admissions" ) +theme(axis.text.x =element_text(angle =45, hjust =1))
I repeated this analysis stratified by sex, first by visualizing the number of births per gestational age.
gestational_nicu_sex_summary <- cleaned_data %>%group_by(OE.Gestational.Age.Recode.10, Sex.of.Infant) %>%summarise(Total_Births =n() # Count the total number of infants for each gestational age and sex ) %>%ungroup()
`summarise()` has grouped output by 'OE.Gestational.Age.Recode.10'. You can
override using the `.groups` argument.
# Print the summarized data to checkprint(gestational_nicu_sex_summary)
# A tibble: 18 × 3
OE.Gestational.Age.Recode.10 Sex.of.Infant Total_Births
<chr> <fct> <int>
1 20 - 27 weeks Female 278
2 20 - 27 weeks Male 315
3 28 - 31 weeks Female 399
4 28 - 31 weeks Male 430
5 32 - 35 weeks Female 1102
6 32 - 35 weeks Male 1127
7 36 weeks Female 957
8 36 weeks Male 1042
9 37 - 39 weeks Female 1221
10 37 - 39 weeks Male 1235
11 40 weeks Female 892
12 40 weeks Male 937
13 41 weeks Female 720
14 41 weeks Male 745
15 42 weeks or more Female 110
16 42 weeks or more Male 134
17 Under 20 weeks Female 3
18 Under 20 weeks Male 6
library(ggplot2)ggplot(gestational_nicu_sex_summary, aes(x = OE.Gestational.Age.Recode.10, y = Total_Births, fill = Sex.of.Infant)) +geom_bar(stat ="identity", position ="dodge") +# Use 'dodge' for separate bars for each sexlabs(title ="Number of Infants Born by Gestational Age and Sex",x ="Gestational Age at Birth",y ="Total Number of Infants" ) +theme(axis.text.x =element_text(angle =45, hjust =1))
I then stratified this data by sex to visualize gestational age and proportion of NICU admissions.
gestational_nicu_sex_summary <- cleaned_data %>%group_by(OE.Gestational.Age.Recode.10, Sex.of.Infant) %>%summarise(Total_Births =n(), # Count the total number of births in each gestational age category and sexNICU_Admissions =sum(NICU.Admission =="Yes", na.rm =TRUE), # Count NICU admissionsProportion_NICU_Admission = NICU_Admissions / Total_Births # Calculate proportion of NICU admissions ) %>%ungroup()
`summarise()` has grouped output by 'OE.Gestational.Age.Recode.10'. You can
override using the `.groups` argument.
# Print the summarized data to checkprint(gestational_nicu_sex_summary)
# A tibble: 18 × 5
OE.Gestational.Age.Recode.10 Sex.of.Infant Total_Births NICU_Admissions
<chr> <fct> <int> <int>
1 20 - 27 weeks Female 278 226
2 20 - 27 weeks Male 315 258
3 28 - 31 weeks Female 399 360
4 28 - 31 weeks Male 430 391
5 32 - 35 weeks Female 1102 609
6 32 - 35 weeks Male 1127 618
7 36 weeks Female 957 337
8 36 weeks Male 1042 425
9 37 - 39 weeks Female 1221 595
10 37 - 39 weeks Male 1235 609
11 40 weeks Female 892 266
12 40 weeks Male 937 311
13 41 weeks Female 720 108
14 41 weeks Male 745 140
15 42 weeks or more Female 110 0
16 42 weeks or more Male 134 1
17 Under 20 weeks Female 3 0
18 Under 20 weeks Male 6 0
# ℹ 1 more variable: Proportion_NICU_Admission <dbl>
library(ggplot2)ggplot(gestational_nicu_sex_summary, aes(x = OE.Gestational.Age.Recode.10, y = Proportion_NICU_Admission, fill = Sex.of.Infant)) +geom_bar(stat ="identity", position ="dodge") +# Use 'dodge' for separate bars for each sexlabs(title ="Proportion of NICU Admissions by Gestational Age and Sex",x ="Gestational Age at Birth",y ="Proportion of NICU Admissions" ) +theme(axis.text.x =element_text(angle =45, hjust =1))
I then ran a logistic regression with gestational age only. Each variable was significant except for gestational age <20 weeks. The <20 weeks age group also has a high standard error (108.2). Moving forward, this will be excluded in the analysis.
# Convert NICU.Admission to binary (1 for "Yes", 0 for "No") cleaned_data$NICU.Admission <-ifelse(cleaned_data$NICU.Admission =="Yes", 1, 0)# Fit the logistic regression model again logistic_model <-glm(NICU.Admission ~ OE.Gestational.Age.Recode.10, data = cleaned_data, family ="binomial")# Summary of the logistic regression modelsummary(logistic_model)
Call:
glm(formula = NICU.Admission ~ OE.Gestational.Age.Recode.10,
family = "binomial", data = cleaned_data)
Coefficients:
Estimate Std. Error z value
(Intercept) 1.4907 0.1060 14.061
OE.Gestational.Age.Recode.1028 - 31 weeks 0.7740 0.1594 4.857
OE.Gestational.Age.Recode.1032 - 35 weeks -1.2882 0.1143 -11.275
OE.Gestational.Age.Recode.1036 weeks -1.9752 0.1156 -17.088
OE.Gestational.Age.Recode.1037 - 39 weeks -1.5298 0.1134 -13.485
OE.Gestational.Age.Recode.1040 weeks -2.2654 0.1174 -19.304
OE.Gestational.Age.Recode.1041 weeks -3.0815 0.1269 -24.289
OE.Gestational.Age.Recode.1042 weeks or more -6.9838 1.0076 -6.931
OE.Gestational.Age.Recode.10Under 20 weeks -14.0568 108.2480 -0.130
Pr(>|z|)
(Intercept) < 2e-16 ***
OE.Gestational.Age.Recode.1028 - 31 weeks 1.19e-06 ***
OE.Gestational.Age.Recode.1032 - 35 weeks < 2e-16 ***
OE.Gestational.Age.Recode.1036 weeks < 2e-16 ***
OE.Gestational.Age.Recode.1037 - 39 weeks < 2e-16 ***
OE.Gestational.Age.Recode.1040 weeks < 2e-16 ***
OE.Gestational.Age.Recode.1041 weeks < 2e-16 ***
OE.Gestational.Age.Recode.1042 weeks or more 4.19e-12 ***
OE.Gestational.Age.Recode.10Under 20 weeks 0.897
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 16042 on 11652 degrees of freedom
Residual deviance: 13837 on 11644 degrees of freedom
AIC: 13855
Number of Fisher Scoring iterations: 11
I am repeating the analysis to exclude <20 weeks gestational age.
cleaned_data$OE.Gestational.Age.Recode.10<-factor(cleaned_data$OE.Gestational.Age.Recode.10, levels =c("40 weeks","20 - 27 weeks","28 - 31 weeks", "32 - 35 weeks", "36 weeks", "37 - 39 weeks", "41 weeks", "42 weeks or more"))# Fit the logistic regression model again logistic_model <-glm(NICU.Admission ~ OE.Gestational.Age.Recode.10, data = cleaned_data, family ="binomial")# Summary of the logistic regression modelsummary(logistic_model)
Call:
glm(formula = NICU.Admission ~ OE.Gestational.Age.Recode.10,
family = "binomial", data = cleaned_data)
Coefficients:
Estimate Std. Error z value
(Intercept) -0.77466 0.05032 -15.395
OE.Gestational.Age.Recode.1020 - 27 weeks 2.26539 0.11736 19.304
OE.Gestational.Age.Recode.1028 - 31 weeks 3.03935 0.12917 23.531
OE.Gestational.Age.Recode.1032 - 35 weeks 0.97723 0.06592 14.826
OE.Gestational.Age.Recode.1036 weeks 0.29016 0.06821 4.254
OE.Gestational.Age.Recode.1037 - 39 weeks 0.73556 0.06451 11.403
OE.Gestational.Age.Recode.1041 weeks -0.81606 0.08594 -9.496
OE.Gestational.Age.Recode.1042 weeks or more -4.71840 1.00135 -4.712
Pr(>|z|)
(Intercept) < 2e-16 ***
OE.Gestational.Age.Recode.1020 - 27 weeks < 2e-16 ***
OE.Gestational.Age.Recode.1028 - 31 weeks < 2e-16 ***
OE.Gestational.Age.Recode.1032 - 35 weeks < 2e-16 ***
OE.Gestational.Age.Recode.1036 weeks 2.10e-05 ***
OE.Gestational.Age.Recode.1037 - 39 weeks < 2e-16 ***
OE.Gestational.Age.Recode.1041 weeks < 2e-16 ***
OE.Gestational.Age.Recode.1042 weeks or more 2.45e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 16031 on 11643 degrees of freedom
Residual deviance: 13837 on 11636 degrees of freedom
(9 observations deleted due to missingness)
AIC: 13853
Number of Fisher Scoring iterations: 7
Using these data, I then calculated odds ratios for gestational age on the outcome of NICU admission.
# Extract model coefficientscoefficients <-summary(logistic_model)$coefficients# Calculate odds ratios (exponentiate the coefficients)odds_ratios <-exp(coefficients[, "Estimate"])lower_ci <-exp(coefficients[, "Estimate"] -1.96* coefficients[, "Std. Error"])upper_ci <-exp(coefficients[, "Estimate"] +1.96* coefficients[, "Std. Error"])# Add confidence intervals to the data frameodds_ratios_df <-data.frame(Gestational_Age =rownames(coefficients),Odds_Ratio = odds_ratios,Lower_CI = lower_ci,Upper_CI = upper_ci)# Remove the intercept rowodds_ratios_df <- odds_ratios_df[odds_ratios_df$Gestational_Age !="(Intercept)", ]print(odds_ratios_df)
library(ggplot2)ggplot(odds_ratios_df, aes(x =reorder(Gestational_Age, Odds_Ratio), y = Odds_Ratio)) +geom_pointrange(aes(ymin = Lower_CI, ymax = Upper_CI), color ="blue", shape =16) +# Add points with error barscoord_flip() +# Flip axes for readabilitygeom_hline(yintercept =1, linetype ="dashed", color ="red") +# Reference line at OR = 1labs(title ="Odds Ratios for NICU Admission",x ="Variable",y ="Odds Ratio (OR)" ) +theme_minimal() +theme(axis.text.x =element_text(angle =45, hjust =1)) # Improve label readability
Logistic regression with gestational age and sex
# Fit the logistic regression model with Gestational age and Sex of infantlogistic_model <-glm(NICU.Admission ~ OE.Gestational.Age.Recode.10+ Sex.of.Infant, data = cleaned_data, family ="binomial")# Show the model summarysummary(logistic_model)
Call:
glm(formula = NICU.Admission ~ OE.Gestational.Age.Recode.10 +
Sex.of.Infant, family = "binomial", data = cleaned_data)
Coefficients:
Estimate Std. Error z value
(Intercept) -0.82738 0.05466 -15.136
OE.Gestational.Age.Recode.1020 - 27 weeks 2.26476 0.11738 19.294
OE.Gestational.Age.Recode.1028 - 31 weeks 3.04023 0.12919 23.534
OE.Gestational.Age.Recode.1032 - 35 weeks 0.97852 0.06594 14.840
OE.Gestational.Age.Recode.1036 weeks 0.28941 0.06823 4.242
OE.Gestational.Age.Recode.1037 - 39 weeks 0.73698 0.06453 11.421
OE.Gestational.Age.Recode.1041 weeks -0.81606 0.08596 -9.494
OE.Gestational.Age.Recode.1042 weeks or more -4.72295 1.00135 -4.717
Sex.of.InfantMale 0.10198 0.04087 2.495
Pr(>|z|)
(Intercept) < 2e-16 ***
OE.Gestational.Age.Recode.1020 - 27 weeks < 2e-16 ***
OE.Gestational.Age.Recode.1028 - 31 weeks < 2e-16 ***
OE.Gestational.Age.Recode.1032 - 35 weeks < 2e-16 ***
OE.Gestational.Age.Recode.1036 weeks 2.22e-05 ***
OE.Gestational.Age.Recode.1037 - 39 weeks < 2e-16 ***
OE.Gestational.Age.Recode.1041 weeks < 2e-16 ***
OE.Gestational.Age.Recode.1042 weeks or more 2.40e-06 ***
Sex.of.InfantMale 0.0126 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 16031 on 11643 degrees of freedom
Residual deviance: 13831 on 11635 degrees of freedom
(9 observations deleted due to missingness)
AIC: 13849
Number of Fisher Scoring iterations: 7
# Extract model coefficientscoefficients <-summary(logistic_model)$coefficients# Calculate odds ratios (exponentiate the coefficients)odds_ratios <-exp(coefficients[, "Estimate"])lower_ci <-exp(coefficients[, "Estimate"] -1.96* coefficients[, "Std. Error"])upper_ci <-exp(coefficients[, "Estimate"] +1.96* coefficients[, "Std. Error"])# Create a data frame of the odds ratiosodds_ratios_df <-data.frame(Variable =rownames(coefficients),Odds_Ratio = odds_ratios,Lower_CI = lower_ci,Upper_CI = upper_ci)# Remove the intercept rowodds_ratios_df <- odds_ratios_df[odds_ratios_df$Variable !="(Intercept)", ]# Print the odds ratios data frameprint(odds_ratios_df)
library(ggplot2)ggplot(odds_ratios_df, aes(x =reorder(Variable, Odds_Ratio), y = Odds_Ratio)) +geom_pointrange(aes(ymin = Lower_CI, ymax = Upper_CI), color ="blue", shape =16) +# Add points with error barscoord_flip() +# Flip axes for readabilitygeom_hline(yintercept =1, linetype ="dashed", color ="red") +# Reference line at OR = 1labs(title ="Odds Ratios for NICU Admission",x ="Variable",y ="Odds Ratio (OR)" ) +theme_minimal() +theme(axis.text.x =element_text(angle =45, hjust =1)) # Improve label readability
Are males more likely to be preterm?
# Create binary variable for preterm (1) vs term (0)cleaned_data$Preterm_Binary <-ifelse(cleaned_data$OE.Gestational.Age.Recode.10%in%c("Under 20 weeks", "20 - 27 weeks", "28 - 31 weeks", "32 - 35 weeks", "36 weeks"), 1, 0)# Logistic regression to test if males are more likely to be pretermlogistic_model <-glm(Preterm_Binary ~ Sex.of.Infant, data = cleaned_data, family ="binomial")summary(logistic_model)
Call:
glm(formula = Preterm_Binary ~ Sex.of.Infant, family = "binomial",
data = cleaned_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.07395 0.02655 -2.785 0.00535 **
Sex.of.InfantMale 0.02604 0.03708 0.702 0.48249
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 16144 on 11652 degrees of freedom
Residual deviance: 16143 on 11651 degrees of freedom
AIC: 16147
Number of Fisher Scoring iterations: 3
I want to visualize the number of infants born term and preterm stratified by sex.
library(ggplot2)# Summarize the data by Sex and Preterm statussummary_data <- cleaned_data %>%group_by(Sex.of.Infant, Preterm_Binary) %>%summarize(Count =n(), .groups ="drop")# Map Preterm_Binary to labels for the plotsummary_data$Preterm_Label <-ifelse(summary_data$Preterm_Binary ==1, "Preterm", "Term")# Create the bar plot with Gestational Age on the x-axisggplot(summary_data, aes(x = Preterm_Label, y = Count, fill = Sex.of.Infant)) +geom_bar(stat ="identity", position ="dodge") +# Dodge for side-by-side barslabs(title ="Number of Infants Born Term and Preterm Stratified by Sex",x ="Gestational Age",y ="Number of Infants",fill ="Sex of Infant" ) +theme_minimal() +theme(axis.text.x =element_text(angle =45, hjust =1), # Angle x-axis labels for better readabilitylegend.position ="top" )
Are males more likely to be born extremely preterm (<28 wks)?
# Create a binary variable for extremely preterm (<28 weeks)cleaned_data$Extremely_Preterm <-ifelse(cleaned_data$OE.Gestational.Age.Recode.10=="Under 20 weeks"| cleaned_data$OE.Gestational.Age.Recode.10=="20 - 27 weeks", 1, 0)# Fit a logistic regression modelmodel <-glm(Extremely_Preterm ~ Sex.of.Infant, data = cleaned_data, family = binomial)# Summary of the modelsummary(model)
Call:
glm(formula = Extremely_Preterm ~ Sex.of.Infant, family = binomial,
data = cleaned_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.96672 0.06150 -48.242 <2e-16 ***
Sex.of.InfantMale 0.07988 0.08446 0.946 0.344
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 4686.4 on 11643 degrees of freedom
Residual deviance: 4685.5 on 11642 degrees of freedom
(9 observations deleted due to missingness)
AIC: 4689.5
Number of Fisher Scoring iterations: 5
1.5 Discussion
Describe your results and include relevant tables, plots, and code/comments used to obtain them. You may refer to the Section 1.3 as needed. End with a brief conclusion of your findings related to the question you set out to address. You can include references if you’d like, but this is not required.
1.6 Conclusion
This the conclusion. The Section 1.4 can be invoked here.
1.7 References
Bell, E.F., et al., Mortality, in-hospital morbidity, care practices, and 2-year outcomes for extremely preterm infants in the US, 2013-2018. JAMA, 2022. 327(3): p. 248-263.
Callaghan WM, MacDorman MF, Rasmussen SA, Qin C, Lackritz EM. The contribution of preterm birth to infant mortality rates in the United States. Pediatrics. 2006;118(4):1566-1573.
Thebaud, B., et al., Bronchopulmonary dysplasia. Nat Rev Dis Primers, 2019. 5(1): p. 78.
van Westering-Kroon, E., et al., Male Disadvantage in Oxidative Stress-Associated Complications of Prematurity: A Systematic Review, Meta-Analysis and Meta-Regression. Antioxidants (Basel), 2021. 10(9).
Inkster AM, Fernández-Boyano I, Robinson WP. Sex differences are here to stay: relevance to prenatal care. Journal of Clinical Medicine. 2021 Jul 5;10(13):3000.
NOT-OD-15-102: Consideration of Sex as a Biological Variable in NIH-Funded Research. https://grants.nih.gov/grants/guide/notice-files/NOT-OD-15-102.html. Accessed 10 Dec. 2024.
Albert, R., Lee, A., Lingappan, K. Response to Therapeutic Interventions in the NICU: Role of Sex as a Biological Variable. 2023. Neoreviews, 24(12), e797-e805.
=======
pretetrm_birth_rates_map
#How many counties are represented in this analysis?num_unique_geoid_counties_preterm <-length(unique(county_map_preterm_data$GEOID))print(num_unique_geoid_counties_preterm)
[1] 496
#Which counties have the highest and lowest preterm birth rates?# Find the county with the highest preterm Birth Rate and its value (show only the first entry)highest_preterm_birth_rate <- county_map_preterm_data %>%filter(Preterm.Birth.Rate ==max(Preterm.Birth.Rate, na.rm =TRUE)) %>%slice(1) # Select the first rowcat("County with the highest preterm Birth Rate: ", highest_preterm_birth_rate$County.of.Residence, "\n")
County with the highest preterm Birth Rate: Kanawha County, WV
# Find the county with the lowest preterm Birth Rate and its value (show only the first entry)lowest_preterm_birth_rate <- county_map_preterm_data %>%filter(Preterm.Birth.Rate ==min(Preterm.Birth.Rate, na.rm =TRUE)) %>%slice(1) # Select the first rowcat("County with the lowest preterm Birth Rate: ", lowest_preterm_birth_rate$County.of.Residence, "\n")
County with the lowest preterm Birth Rate: Ocean County, NJ
Describe your results and include relevant tables, plots, and code/comments used to obtain them. You may refer to the Section 1.3 as needed. End with a brief conclusion of your findings related to the question you set out to address. You can include references if you’d like, but this is not required.
1.5 Conclusion
This the conclusion. The Section 1.4 can be invoked here.